Static LU Decomposition on Heterogeneous Platforms

Authors

  • Olivier Beaumont
  • Arnaud Legrand
  • Fabrice Rastello
  • Yves Robert

Abstract

This paper addresses algorithmic issues on heterogeneous platforms, concentrating on dense linear algebra kernels such as matrix multiplication and LU decomposition. The block-cyclic distribution techniques used in ScaLAPACK are no longer sufficient to balance the load among processors running at different speeds. The main result is a static data distribution scheme that achieves asymptotically perfect load balancing for LU decomposition. This lays solid foundations for the design of a cluster-oriented version of ScaLAPACK.
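To see why speed-agnostic cyclic distributions fall short, and how a static scheme can compensate, here is a minimal sketch. It assumes a one-dimensional column distribution and a fixed, hypothetical per-column cycle time for each processor. It ignores communication and costs elimination step k as the slowest processor's share of the trailing columns. The greedy rule and the reverse (last-column-first) fill are illustrative choices, not necessarily the paper's exact algorithm. All function names are hypothetical.

```python
"""Sketch: heterogeneous 1D column allocation for a right-looking LU.

Assumptions (illustrative, not taken from the paper): processor i needs
cycle_times[i] time units per column update, communication is ignored,
and elimination step k costs the slowest processor's share of the
trailing columns k+1..n-1.
"""


def greedy_allocation(cycle_times, n):
    """Hand out n columns one at a time, each to the processor whose
    finishing time (count + 1) * cycle_time would be smallest.
    Every prefix of the returned sequence minimizes
    max(count_i * cycle_time_i) for that many columns."""
    counts = [0] * len(cycle_times)
    order = []
    for _ in range(n):
        best = min(range(len(cycle_times)),
                   key=lambda i: (counts[i] + 1) * cycle_times[i])
        counts[best] += 1
        order.append(best)
    return order


def reverse_greedy_allocation(cycle_times, n):
    """Owner of each column, filled from the last column backwards so that
    every trailing block of columns is itself a greedy prefix."""
    return greedy_allocation(cycle_times, n)[::-1]


def cyclic_allocation(p, n):
    """Plain column-cyclic (round-robin) distribution, speed-agnostic."""
    return [j % p for j in range(n)]


def lu_update_time(owner, cycle_times):
    """Sum over elimination steps of the slowest processor's update time."""
    remaining = [0] * len(cycle_times)
    total = 0.0
    # Walk backwards: when step k is costed, `remaining` counts each
    # processor's columns among k+1..n-1.
    for k in range(len(owner) - 1, -1, -1):
        total += max(c * t for c, t in zip(remaining, cycle_times))
        remaining[owner[k]] += 1
    return total


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]  # hypothetical: processor 0 is 3x faster than 2
    n = 12
    print("cyclic:        ", lu_update_time(cyclic_allocation(len(times), n), times))
    print("reverse greedy:", lu_update_time(reverse_greedy_allocation(times, n), times))
```

Under this simplified model, the reverse-greedy distribution is never slower than the cyclic one. Each trailing block of m columns gets an allocation that minimizes the largest per-processor load for m columns, so every elimination step is individually balanced.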

Related articles

Locality-Aware Work Stealing on Multi-CPU and Multi-GPU Architectures

Most recent HPC platforms have heterogeneous nodes composed of a combination of multi-core CPUs and accelerators, like GPUs. Scheduling on such architectures relies on a static partitioning and cost model. In this paper, we present a locality-aware work stealing scheduler for multi-CPU and multi-GPU architectures, which relies on the XKaapi runtime system. We show performance results on two den...

A group block distribution strategy for a heterogeneous machine

This paper discusses the data distribution problem for inherently sequential algorithms, such as the LU factorization in linear algebra, when computed on heterogeneous machines. These algorithms present additional difficulties to optimize the processing time due to the fact that the computational load for data matrix columns increases with their index, requiring a fine-tuned load assignment and...

Special issue on parallel matrix algorithms and applications

This issue of the journal contains 11 articles selected from invited and contributed presentations made at the Workshop on Parallel Matrix Algorithms and Applications, which was held in Neuchâtel, Switzerland, on August 18–20, 2000. The workshop was well attended with participants from all over Europe and the United States. Papers presented at the workshop covered many aspects of parallel ...

Parallelization of the LU Decomposition on Heterogeneous Systems

With the appearance of GPUs as valid platforms, not only for graphics computation, but also general-purpose computations, applications that exploit hybrid/heterogeneous systems can be made available to the mass market due to the widespread availability of these systems. Correct distribution of the workload of these applications can lead the way to significant performance boosts to complex applicati...

Performance Predictions of Multilevel Communication Optimal LU and QR Factorizations on Hierarchical Platforms

In this paper we study the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We note that we focus on multilevel QR factorization, and give a brief description of the multilevel LU factorization. We first introduce a performance model called Hierarchical Cluster Platform (Hcp), encapsulating the characteristics ...

Journal:
  • IJHPCA

Volume 15

Publication date: 2001